1. INTRODUCTION


Experimental studies involving a large number of factors can require
a prohibitively large and costly research program. Often it is
anticipated that only a small subset of the factors is important in
explaining the response. Accordingly, we may want to conduct a
preliminary screening experiment to determine the subset of "most
important" factors. Such experiments are not an end in themselves but
are performed as an initial phase of experimentation. Once the most
influential factors have been isolated, future experimentation can
investigate these factors in detail. By reducing the size of the
experimental problem at the screening stage, we conserve resources
and study the factors of interest more efficiently and effectively.
Screening experiments have potential application in many research areas
such as manufacturing, engineering, product development, and simulation.

The factor screening problem has been considered by a number of
authors; see, for instance, Anscombe (1963), Booth and Cox (1962),
Budne (1959b), Kleijnen (1975), and Satterthwaite (1959). However,
there has been no objective evaluation and comparison of the available
screening methods. Kleijnen (1975) and Smith and Mauro (1984) have
divided the screening problem into two general situations: the
unsaturated/saturated situation and the supersaturated situation. In
the unsaturated/saturated situation, one can afford to invest more runs
than there are factors. Designs that have been generally recommended
for use in this situation include Plackett-Burman designs (Plackett and
Burman 1946) and resolution IV foldover designs (Box, Hunter, and
Hunter 1978). These designs have been extensively studied and their
properties are well known.

In the supersaturated situation, the number of factors equals or
exceeds the number of runs available for screening. Designs that have
been suggested for this situation include group screening designs
(Li 1962; Patel 1962; Watson 1961), random designs (Satterthwaite
1959), and systematic supersaturated designs (Booth and Cox 1962).
The performance characteristics of these designs are largely unknown.
Furthermore, there are few examples of supersaturated screening
experiments in the literature. For the researcher contemplating a
supersaturated experiment, it is therefore difficult to find practical
guidelines for either design or analysis.

In this paper we compare the performance of random balance (RB)
and two-stage group screening (GS) designs in a case study in which
$K = 100$ factors are screened in $N = 20$, 42, 62, and 84 runs. In
addition, we discuss the relative merits and demerits of each approach.
This discussion should provide some practical insight into the
selection and use of these designs. We have not included systematic
supersaturated designs in our study since these designs have not been
tabulated for $K > 36$ and there is no efficient algorithm for their
general construction. Booth and Cox (1962) have shown that these
designs are in general more efficient than RB designs.

As  an  underlying  screening  model,  we  will  assume  the  model  defined 
in  Section  2.  In  Sections  3  and  4  we  describe  and  discuss  the  RB 
and  GS  strategies  which  we  consider.  In  Section  5  we  present  and  discuss 
the  results  of  our  case  study.  A  brief  summary  follows  in  Section  6. 


2.  A  SCREENING  MODEL 


It generally suffices in screening problems to employ two levels
of each factor; see, for example, Montgomery (1979; p. 5) and Box,
Hunter, and Hunter (1978; pp. 306-307). Furthermore, for the purpose
of detecting the factors which have a major effect it is usually
reasonable to assume the first-order model

$$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \varepsilon_i \qquad (2.1)$$

where $y_i$ is the $i$th observation, $\beta_0$ is a constant component
common to all observations, $K$ is the number of factors, $\beta_j$ is
the linear effect of the $j$th factor, $x_{ij}$ is the level (coded
$\pm 1$) of the $j$th factor in the $i$th run, and the $\varepsilon_i$
are i.i.d. $N(0, \sigma_\varepsilon^2)$ error terms, with
$\sigma_\varepsilon^2$ unknown. Ordinarily, we would use model (2.1)
over a relatively small region of the factor space.

In this paper we restrict our comparisons of screening strategies
to model (2.1). Moreover, we will use this model as a basis for
performance assessment and data generation in our case study.


3.  RANDOM  BALANCE  DESIGNS 


In a two-level ($\pm 1$) RB design, each column of the design matrix
consists of $N/2$ +1's and $N/2$ -1's where $N$, an even number, denotes
the total number of runs to be made. The +1's and -1's in each column
are assigned randomly, making all possible combinations of $N/2$ +1's
and $N/2$ -1's equally likely, with each column receiving an
independent randomization.
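The construction just described can be sketched in a few lines. The
following is a minimal illustration only, not the program used in this
study; the function name `random_balance_design` and the `seed`
argument are assumptions introduced for the example.

```python
import random

def random_balance_design(n_runs, n_factors, seed=None):
    """Return an n_runs x n_factors design (list of rows) with entries +1/-1.

    Each column holds exactly n_runs/2 entries of +1 and n_runs/2 of -1,
    shuffled independently of every other column.
    """
    if n_runs % 2 != 0:
        raise ValueError("n_runs must be even")
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        col = [1] * (n_runs // 2) + [-1] * (n_runs // 2)
        rng.shuffle(col)  # every arrangement of the column is equally likely
        columns.append(col)
    # Transpose so that rows are runs and columns are factors.
    return [list(row) for row in zip(*columns)]

# Example: 100 factors in 20 runs.
design = random_balance_design(20, 100, seed=1)
```

Because each column is shuffled separately, no relationship between the
number of runs and the number of factors is needed beyond the
requirement that the number of runs be even.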

The principal advantages of RB sampling for use in screening
problems are its flexibility and the ease with which we can prepare
these designs. The number of runs $N$ can be selected independently of
the number of factors $K$; no mathematical restriction or relationship
(except that $N$ be even) need exist between $N$ and $K$.

There are two primary disadvantages to RB sampling. The first of
these is that confounding is random. Anscombe (1959; p. 201) has written,
"The fact that the degree of nonorthogonality or unbalance is random can
be made the basis for an objection to the whole notion of random balance
designs. Such designs may work well on the average, but should I trust
to one on this occasion?" The second disadvantage, which is closely
related to the first, is that there is no generally accepted method of
analysis for RB designs. The simplest approach, and the one adopted in
this paper, is to consider each factor separately, ignoring all other
factors, and apply a standard F-test. More sophisticated analysis methods
which have been used include least squares stepwise and stagewise
regression. We refer the reader to Youden et al. (1959) for a more
complete discussion of RB experimentation.
As we have already indicated, we consider a standard F-test applied
separately to each factor as the method of analysis for RB designs. We
assume for simplicity that each F-test is conducted at the same level of
significance, say $\alpha_r$. We denote such a strategy by
RB($N, \alpha_r$). Furthermore, for screening purposes, we classify a
factor as important only if its associated F-ratio is significant, i.e.,
equals or exceeds the upper $100(1-\alpha_r)$ percentage point of an
F-distribution with $(1, N-2)$ degrees of freedom.
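As a rough sketch of this analysis, the F-ratio for one factor is the
usual one-way analysis of variance ratio for the responses at the +1
and -1 levels. The function names `f_ratio` and `screen` are
hypothetical, and the critical value `f_crit` is assumed to be supplied
by the user from F tables for $(1, N-2)$ degrees of freedom.

```python
def f_ratio(y, x):
    """F-ratio on (1, N-2) d.f. comparing responses at x = +1 with x = -1."""
    high = [yi for yi, xi in zip(y, x) if xi == 1]
    low = [yi for yi, xi in zip(y, x) if xi == -1]
    n_h, n_l = len(high), len(low)
    mean_h = sum(high) / n_h
    mean_l = sum(low) / n_l
    # Pooled within-level sum of squares on N-2 degrees of freedom.
    ss_within = (sum((v - mean_h) ** 2 for v in high)
                 + sum((v - mean_l) ** 2 for v in low))
    ms_within = ss_within / (n_h + n_l - 2)
    # Between-level mean square on 1 degree of freedom.
    ms_between = (mean_h - mean_l) ** 2 / (1.0 / n_h + 1.0 / n_l)
    return ms_between / ms_within

def screen(y, design, f_crit):
    """Indices of factors whose F-ratio equals or exceeds f_crit."""
    flagged = []
    for j in range(len(design[0])):
        column = [row[j] for row in design]  # levels of factor j over runs
        if f_ratio(y, column) >= f_crit:
            flagged.append(j)
    return flagged
```

Each factor is tested as if no other factor were present, so the
effects of all other factors enter the within-level mean square.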

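The simple least squares estimator of $\beta_j$ obtained by ignoring
all other factors is given by

$$\hat{\beta}_j = (\bar{y}_{+j} - \bar{y}_{-j})/2 \qquad (3.1)$$

where $\bar{y}_{+j}$ ($\bar{y}_{-j}$) is the average value of the
response over the $N/2$ runs at the $+1$ ($-1$) level of the $j$th
factor. To simplify notation, we let $\mathbf{y}$ denote the
$N \times 1$ vector $(y_1, y_2, \ldots, y_N)'$ of responses and
$\mathbf{x}_j$ denote the $N \times 1$ vector
$(x_{1j}, x_{2j}, \ldots, x_{Nj})'$. In an RB experiment, the
$N \times K$ design matrix
$X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K]$ is, by
construction, stochastic. Assuming that $X$ and the $\varepsilon_i$ are
independent, it is easily shown that conditional on $X$,

$$E(\hat{\beta}_j \mid X) = \beta_j + \Big( \sum_{i \ne j} \beta_i \mathbf{x}_i' \mathbf{x}_j \Big) / N \qquad (3.2)$$

$$V(\hat{\beta}_j \mid X) = \sigma_\varepsilon^2 / N. \qquad (3.3)$$

The conditional mean square error (MSE) of $\hat{\beta}_j$ is then

$$\mathrm{MSE}(\hat{\beta}_j \mid X) = \sigma_\varepsilon^2 / N + \Big( \sum_{i \ne j} \beta_i \mathbf{x}_i' \mathbf{x}_j \Big)^2 / N^2. \qquad (3.4)$$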

Unconditionally, Mauro and Smith (1984) have shown, see also Box (1959),
that

$$E(\hat{\beta}_j) = \beta_j \qquad (3.5)$$

and

$$V(\hat{\beta}_j) = \sigma_\varepsilon^2 / N + \sum_{i \ne j}^{K} \beta_i^2 / (N-1). \qquad (3.6)$$

As Box (1959) points out, equations (3.2) and (3.3) refer to the
behavior of the estimates for repetitions of a particular RB design.
Equations (3.5) and (3.6), on the other hand, refer to the behavior of
the estimates if we average over the random choice of RB designs. Box
(1959) comments further that although $\hat{\beta}_j$ is unconditionally
unbiased, the effect of the conditional bias term in (3.2) is
transferred to the (unconditional) variance of $\hat{\beta}_j$, which
now contains terms from every other factor present.
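The conditional bias term in (3.2), namely
$\sum_{i \ne j} \beta_i \mathbf{x}_i' \mathbf{x}_j / N$, can be
evaluated directly for any realized design. The sketch below is
illustrative only; the function name `conditional_bias` and the
list-of-rows representation of the design are assumptions made for the
example.

```python
def conditional_bias(design, beta, j):
    """Bias of the j-th estimate given the design:
    sum over i != j of beta_i * (x_i' x_j) / N."""
    n_runs = len(design)
    total = 0.0
    for i, b in enumerate(beta):
        if i == j:
            continue
        inner = sum(row[i] * row[j] for row in design)  # x_i' x_j
        total += b * inner
    return total / n_runs

# Two orthogonal columns: no bias is carried into the estimate.
orthogonal = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
# Two identical columns: the full effect of factor 1 is carried into factor 0.
confounded = [[1, 1], [1, 1], [-1, -1], [-1, -1]]
bias_orthogonal = conditional_bias(orthogonal, [2.0, 3.0], 0)
bias_confounded = conditional_bias(confounded, [2.0, 3.0], 0)
```

In an RB design the inner products $\mathbf{x}_i' \mathbf{x}_j$ are
random, so the bias of any one realized design lies somewhere between
these two extremes.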


4.  TWO-STAGE  GROUP  SCREENING  DESIGNS 


In the two-stage group screening method, introduced by Watson (1961),
the individual factors (each at two levels) are partitioned into groups.
By assigning the same level to all component factors, the groups are
tested (in a first-stage experiment) as if they were single factors.
Those factors within significant groups are subsequently tested
individually in a follow-up second-stage experiment. The basic idea
behind this method is that factors within a group are completely
confounded. Thus, after the first stage, we can eliminate from further
consideration all factors within non-significant groups. The fewer the
important factors, the more effective is the technique.

To study the group-factors in the first stage and the individual
factors which reach the second stage, we will use the resolution III
multifactorial designs of Plackett and Burman (PB). These designs are
specially constructed two-level orthogonal designs for studying up to
$(4m-1)$ factors in $4m$ runs. Mathematically, the number of runs
required by the smallest PB design to study $S$ factors (or
group-factors) is given by

$$B(S) = S + 4 - S \pmod 4.$$

However, in order to ensure at least one degree of freedom for error,
we shall employ the PB design in $B(S+1)$ runs. Since we are assuming an
underlying first-order screening model, the use of PB designs would seem
reasonable. We analyze these designs with the usual analysis of variance
procedures for factorial experiments.
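The run-count rule is simple enough to state as code. The following
sketch is illustrative; the function names are hypothetical, and the
convention that no second-stage experiment is run when no factor
reaches the second stage is an assumption made for the example.

```python
def pb_runs(s):
    """Smallest Plackett-Burman run count for s factors:
    B(S) = S + 4 - S (mod 4)."""
    return s + 4 - (s % 4)

def gs_total_runs(n_groups, n_second_stage_factors):
    """Total runs over both stages, each using a PB design in B(S+1) runs."""
    first = pb_runs(n_groups + 1)
    # Assumption: no second-stage experiment when nothing is carried over.
    if n_second_stage_factors > 0:
        second = pb_runs(n_second_stage_factors + 1)
    else:
        second = 0
    return first + second
```

For example, `pb_runs(3)` gives 4 and `pb_runs(4)` gives 8, in line
with a PB design in $4m$ runs accommodating up to $4m-1$ factors.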
We make the additional assumption that the $K$ factors are partitioned
randomly into $G$ groups of size $g$; if $K$ is not a multiple of $g$,
we will assume that the group sizes are taken as "evenly" as possible.
The assumptions of equal group sizes and random allocation to groups are
appropriate if there is no prior knowledge about the effects. Of course,
if the experimenter has prior information indicating that some effects
are larger than others, then it is better to group those effects
together. Moreover, the experimenter is not limited in practice to a
constant group size. Watson (1961) has discussed the device of using
different group sizes when prior probabilities differ.
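A random allocation with group sizes as even as possible might be
sketched as follows. The function name `random_groups` is hypothetical,
and the number of groups is taken here as an input rather than derived
from the group size, since the rounding rule is not spelled out in the
text.

```python
import random

def random_groups(n_factors, n_groups, seed=None):
    """Randomly split factor indices 0..n_factors-1 into n_groups groups
    whose sizes differ by at most one."""
    rng = random.Random(seed)
    order = list(range(n_factors))
    rng.shuffle(order)  # random allocation of factors to groups
    base, extra = divmod(n_factors, n_groups)
    groups, start = [], 0
    for k in range(n_groups):
        size = base + (1 if k < extra else 0)  # first `extra` groups get one more
        groups.append(sorted(order[start:start + size]))
        start += size
    return groups

# Example: 100 factors in 20 groups of size 5.
groups = random_groups(100, 20, seed=0)
```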

An important advantage of GS designs is that we can to some extent
control the confounding pattern. There are two disadvantages to GS
designs. The first of these is that the total number of runs required in
a group screening experiment is random. Specifically, the number of
second-stage runs depends on the number of significant groups, which is
subject to testing and grouping variation. Thus, unlike in an RB design,
the total number of runs required in a GS design is not fixed prior to
experimentation. The second disadvantage is that effects may cancel
within a group. As a simple example, consider two factors which have
effects that are negatives or near-negatives of each other. If these two
factors are the only important factors in a given group, their effects
will cancel or their combined effect may be masked by experimental
error. It is desirable, then, to have prior knowledge of the directions
of all suspected effects. With this information, factor levels can be
assigned so that all potential effects are in the same direction.
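Under the first-order model, a group-factor whose components are all
set to the same coded level has an effect equal to the sum of the
component effects, which is why cancellation can occur. The sketch
below illustrates this; the function name `group_effect` and the
optional `signs` argument (a +1 or -1 orientation of each factor's
levels) are assumptions introduced for the example.

```python
def group_effect(effects, group, signs=None):
    """Effect of a group-factor: the sum of its component effects.

    signs[j] = -1 means the levels of factor j are reversed before
    grouping, which reverses the direction of its effect.
    """
    if signs is None:
        signs = {j: 1 for j in group}
    return sum(signs[j] * effects[j] for j in group)

effects = [3.0, -3.0, 0.0, 0.0]
group = [0, 1, 2, 3]
# Opposite effects cancel when directions are unknown.
cancelled = group_effect(effects, group)
# Reversing the levels of factor 1 aligns both effects.
aligned = group_effect(effects, group, signs={0: 1, 1: -1, 2: 1, 3: 1})
```

Here the first call gives a group effect of zero, while the second,
which uses knowledge of the effect directions, gives a group effect of
6.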
In practice, the direction will most likely be known for some
suspected effects and unknown for others. Kleijnen (1975; p. 489) has
pointed out that "unequal groups sizes make it possible to test a
factor individually when we do not know the direction of its effect."
However, it would seem feasible to treat only a few suspected effects in
this manner. It seems certain that in some applications cancellation
may not be avoided. The effect of cancellation has been studied under
"worst-case" assumptions by Mauro and Smith (1982) and Mauro (1983a).

Finally, we let $\alpha_1$ and $\alpha_2$ denote the levels of the
significance tests performed at the end of the first and second stages,
respectively. Since we assume a constant group size $g$ in addition to
random allocation, our GS strategy is completely specified by $g$,
$\alpha_1$, and $\alpha_2$. We denote such a strategy by
GS($g, \alpha_1, \alpha_2$).
5. A CASE STUDY


This section describes a case study in which we compare
RB($N, \alpha_r$) strategies for testing $K = 100$ factors in $N = 20$,
42, 62, and 84 runs with corresponding GS($g, \alpha_1, \alpha_2$)
strategies in as many expected runs. In addition, for each $N$, we
consider type I error rates of 5% and 10%. Power curves are used as a
basis of comparison.

In our case study, we assume model (2.1) together with the
regression coefficients given in Table 1. The absolute magnitudes of
the effects listed in Table 1 correspond to the expected order
statistics (rounded to two decimal places) of a sample of size 100 from
a gamma distribution with mean $0.5\sigma_\varepsilon$ and standard
deviation $1.5\sigma_\varepsilon$; see Figure 1. These particular
effects seem reasonable as an illustrative screening example and are in
accordance with the "mal-distribution" assumption discussed by Budne
(1959a). Furthermore, since it is unlikely that all effect directions
will be known a priori, we have allowed some factors to have negative
effects. The proportion of negative effects has been made a decreasing
function of their absolute magnitudes, since that scenario seems most
likely to occur in practice.
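For a gamma distribution, the stated mean and standard deviation
determine the shape and scale parameters by the method of moments
(shape = mean squared over variance, scale = variance over mean). The
sketch below works in units of $\sigma_\varepsilon$. It draws one
random sample only to illustrate how skewed the effect sizes are; it
does not reproduce the expected order statistics of Table 1. The
function name `gamma_shape_scale` is hypothetical.

```python
import random

def gamma_shape_scale(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and s.d."""
    shape = (mean / sd) ** 2  # mean^2 / variance
    scale = sd ** 2 / mean    # variance / mean
    return shape, scale

# Effects in units of sigma_eps: mean 0.5, standard deviation 1.5.
shape, scale = gamma_shape_scale(0.5, 1.5)

rng = random.Random(0)
# random.gammavariate(alpha, beta) takes alpha = shape and beta = scale.
sample = sorted(rng.gammavariate(shape, scale) for _ in range(100))
```

A shape parameter well below one corresponds to a density concentrated
near zero with a long right tail, that is, many negligible effects and
a few large ones.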

5.1  RB  Power  Curves 


In order to evaluate the power of an RB($N, \alpha_r$) strategy, we
adopt the method of averaging over the set of possible
RB($N, \alpha_r$) designs. (Dempster 1960 has discussed the
ramifications of this approach.) In the appendix, we show that the RB
model can be related to the exchangeable linear model, as discussed by
Arnold (1979, 1981). Using results derived for the exchangeable linear
model, we obtain expressions which can be used to determine power
probabilities in an RB($N, \alpha_r$) strategy. The power
approximations outlined in the appendix are an improvement over those
given in Mauro and Smith (1984). It is our experience that these
approximations are very good even for small values of $N$. We make the
additional observation that the type I error rate in an
RB($N, \alpha_r$) strategy is very closely approximated by $\alpha_r$.

Figures 2a-2d contain the power curves corresponding to the eight
RB($N, \alpha_r$) strategies specified by $N = 20$, 42, 62, and 84 runs
with $\alpha_r = 0.05$ and 0.10. On each of these plots, the solid curve
corresponds to $\alpha_r = 0.05$ and the dashed curve corresponds to
$\alpha_r = 0.10$.

5.2  GS  Power  Curves 


Mauro (1983a) has developed formulas to determine power probabilities
in a GS($g, \alpha_1, \alpha_2$) strategy for the special case in which
each individual factor has an effect of size
$+\Delta\sigma_\varepsilon$, $-\Delta\sigma_\varepsilon$, or 0.
Unfortunately, the more general case of arbitrary effects is too complex
to analyze mathematically. For our case study, therefore, it was
necessary that we use a Monte Carlo simulation program to estimate power
probabilities in a GS($g, \alpha_1, \alpha_2$) strategy. Using this
program we determined, by trial and error, the particular
GS($g, \alpha_1, \alpha_2$) strategies which maximized power subject to
the specified constraints on the expected number of runs, $E(N)$, and
type I error.

The "optimal" GS($g, \alpha_1, \alpha_2$) strategies that we selected
are reported in Table 2. In this table we also have given the sample
mean $\bar{N}$ and standard deviation $S_N$ associated with each GS
strategy. The results are based on 10,000 simulations (except for one
case noted in the table).

It is important to note that in a GS($g, \alpha_1, \alpha_2$)
strategy, the expected number of runs depends only on $g$ and
$\alpha_1$. Furthermore, given $g$ and $\alpha_1$, type I error can be
expressed as

$$\text{Type I Error} = C(g, \alpha_1)\,\alpha_2.$$

Thus, type I error is directly proportional to $\alpha_2$ and bounded
by the constant $C(g, \alpha_1)$.

Figures 3a-3d contain the empirical power curves corresponding to
the selected GS($g, \alpha_1, \alpha_2$) strategies with 5% type I
error. In these plots, the solid curve is the power associated with the
positive effects. Since the power associated with a negative effect is
generally less than that for a positive effect of the same magnitude, we
have marked the negative effects separately. This phenomenon is due to
cancellation and will be discussed later.

We do not present the GS power curves for 10% type I error since
these curves were virtually identical, except for effects of small
magnitude (less than $0.5\sigma_\varepsilon$), with the curves plotted
in Figure 3. Apparently, the levels of $\alpha_2$ in the
GS($g, \alpha_1, \alpha_2$) strategies reported in Table 2 were
sufficiently large to permit detection of even moderately sized effects
for both 5% and 10% type I errors.


5.3  Discussion 


It  is  convenient  to  define  relative  testing  cost  (RTC)  as  the 
ratio  of  the  number  (or  expected  number)  of  runs  required  by  a 
screening  strategy  to  B(K+1),  which  is  the  number  of  runs  required  by 
a  PB  design  for  K  factors.  A  quick  calculation  will  show  that  for 
K=100  factors,  run  numbers  of  20,42,62,  and  84  correspond  roughly  to 
relative  testing  costs  of  20%,  40%,  60%,  and  80%,  respectively. 
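The calculation referred to above can be written out as follows. This
is a small sketch with hypothetical function names; `pb_runs` restates
the rule $B(S) = S + 4 - S \pmod 4$ so that the block is self-contained.

```python
def pb_runs(s):
    """Smallest Plackett-Burman run count for s factors."""
    return s + 4 - (s % 4)

def relative_testing_cost(n_runs, n_factors):
    """Runs used by a screening strategy relative to B(K+1)."""
    return n_runs / pb_runs(n_factors + 1)

# K = 100 factors: the reference PB design has B(101) = 104 runs.
costs = {n: relative_testing_cost(n, 100) for n in (20, 42, 62, 84)}
```

With $B(101) = 104$, the four ratios are 20/104, 42/104, 62/104, and
84/104, which round to the percentages quoted above.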

A comparison of Figures 2a-2d with 3a-3d shows that the GS power
curves are clearly superior to the corresponding RB power curves for
60% and 80% RTC. At 40% RTC, the GS and RB power curves are fairly
comparable, although the RB strategy has slightly greater power for
large effects while the GS strategy has more power to detect small
effects. For 5% type I error, the curves cross at about
$4.3\sigma_\varepsilon$, at which power is roughly 50%. A comparison of
the 20% RTC power curves shows that RB is the superior method here. In
this case, the selected GS strategy has only a 25% chance to detect the
largest effect ($9.57\sigma_\varepsilon$) in the model. On the other
hand, the RB(20, 0.05) strategy has roughly a 25% chance to detect an
effect of size $4.5\sigma_\varepsilon$, a 50% chance to detect an effect
of size $6\sigma_\varepsilon$, and a 95% chance to detect the largest
effect.

As indicated by the uniformly low power attained at 20% RTC,
group screening designs are not well suited for use when there are
severe limitations on the number of runs. To screen 100 factors in 20
runs, for example, we note that group sizes less than seven cannot be
used since the number of first stage runs must necessarily exceed 20 runs.
For group sizes of seven or larger, the total number of runs required
by both stages of screening will exceed 20 if even one group is carried
over to the second stage. Consequently, in order to have $E(N) < 20$
runs, $\alpha_1$ must be extremely small, ensuring that the expected
number of significant groups is less than one. This results in very low
power, no matter how large an effect might be.

As mentioned previously, we have purposely included fewer negative
effects in our case study than positive effects, with the proportion of
negative effects being a decreasing function of magnitude. For very
small effects, this proportion is nearly 50%, and the GS power curves
show that small positive and negative effects have nearly the same
probability of being detected. For larger effects, however, a smaller
proportion of factor effects are negative. Thus, negative effects have a
greater chance of being grouped together with positive effects,
resulting in cancellation. Positive effects, on the other hand, are more
likely to be grouped together, precluding cancellation. As a result,
negative effects are detected less frequently than positive effects of
the same magnitude. However, as can be seen from Figure 3, the effect of
cancellation is minimal in this case study.

An important practical consideration which we have not yet addressed
is that our determination of the "optimal" GS($g, \alpha_1, \alpha_2$)
strategies listed in Table 2 required prior knowledge of the effects. Of
course, if one's prior knowledge is perfect, there is no need for a
screening experiment. More realistically, though, one's prior knowledge
is never perfect and some speculation is required. In any event, it is
hard to see how one might reasonably go about choosing a
GS($g, \alpha_1, \alpha_2$) strategy in the absence of such prior
information or speculation.

To indicate the potential effects of imprecise prior knowledge,
consider a GS($g, \alpha_1, \alpha_2$) strategy chosen with a specific
distribution of effects in mind. If this set of effects does not closely
approximate the true situation, the chosen group size may not be
optimum. Furthermore, $\alpha_1$ and $\alpha_2$ may be misspecified, so
that RTC and type I error deviate from their desired values. We refer to
Mauro (1983b) for a study of this problem in the special case where each
individual factor has an effect of $-\Delta\sigma_\varepsilon$,
$\Delta\sigma_\varepsilon$, or 0.

An additional consideration in the use of a GS strategy is that
the number of test runs is a random variable. In this paper we have
restricted ourselves, somewhat arbitrarily, to looking at the expected
number of runs. From a practical standpoint, however, this can be
rather disconcerting to the researcher contemplating a group screening
experiment. In order to evaluate the severity of this problem, we have
estimated the standard deviation of the number of runs for each of the
four cases considered. These quantities ($S_N$) are given in Table 2 and
can be seen to be very large compared with RTC. In Figure 4 we present
a histogram of the number of runs needed in each of the 10,000
simulations of the GS(5, 0.002, 0.1667) strategy (60% RTC and 5% type I
error). Inspection of this histogram shows that in 10% of the
simulations the number of runs was 84 or greater.

Since an experimenter might be reluctant to use a strategy which
does not allow him to predetermine his testing cost, we have considered
a modified version of group screening, where the number of runs is
fixed. The experimenter decides beforehand the number of groups, say
$m$, he is willing to carry over to the second stage. After the
first-stage experiment, the $m$ groups with the largest estimated
effects are chosen and their component factors tested individually in a
second-stage experiment. This of course implies a random first-stage
significance level. However, the overall type I error can be controlled
by suitably adjusting the second-stage significance level. We have done
some preliminary investigation of this type of strategy and our results
indicate that its performance is comparable to that of standard group
screening. However, further work needs to be done, including the
investigation of hybrid strategies where only the maximum number of runs
is specified, allowing a smaller experiment if such seems justified by
the first-stage results.
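The selection step of this modified strategy might be sketched as
follows. This is an illustration only, not the procedure used in our
preliminary investigation: the function names are hypothetical, ranking
by absolute value of the estimated group effect is an assumption, and
the run count assumes a constant group size with PB designs in
$B(S+1)$ runs at both stages.

```python
def pb_runs(s):
    """Smallest Plackett-Burman run count for s factors."""
    return s + 4 - (s % 4)

def select_top_groups(group_estimates, m):
    """Indices of the m groups with the largest estimated effects
    (assumption: 'largest' is judged by absolute value)."""
    ranked = sorted(range(len(group_estimates)),
                    key=lambda k: abs(group_estimates[k]), reverse=True)
    return sorted(ranked[:m])

def fixed_total_runs(n_groups, group_size, m):
    """Total runs, known in advance, when exactly m groups are carried over."""
    return pb_runs(n_groups + 1) + pb_runs(m * group_size + 1)

# Example: carry over the 2 largest of 4 estimated group effects.
chosen = select_top_groups([0.2, -3.1, 1.5, 0.1], 2)
```

Because $m$ is chosen before the first stage is run, the total number of
runs no longer depends on the first-stage results.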
6. SUMMARY


In this paper we have focused on the problem of supersaturated
screening experiments. We have restricted our attention to two basic
screening methods: random balance and two-stage group screening. Our
primary observations are illustrated by means of a case study in
which $K = 100$ factors are screened in $N = 20$, 42, 62, and 84 runs.
A comparison of power curves showed that group screening had much
greater power in $N = 62$ and 84 runs. Random balance had slightly
greater power for detecting large effects in $N = 42$ runs. When
$N = 20$ runs, group screening had only a 25% chance of detecting the
largest effect ($9.57\sigma_\varepsilon$) in the model while random
balance had a 95% chance of detecting that effect.

In addition to a comparison of power, we discuss the relative
merits and demerits of each strategy. Our findings indicate that,
unless there is a severe limitation on the number of runs, group
screening appears to be the better strategy. However, there are two
primary drawbacks to the use of group screening. First, the total number
of runs is a random variable. Second, one needs a certain amount of
prior information to choose an appropriate group screening strategy.
Figure 1. Probability Density Function of a Gamma Distribution.

Figure 2 (continued). RB Power Curves: Solid Curve Is For 5%
Type I Error, Dashed Curve Is For 10% Type I Error.

Figure 3. GS Power Curves for 5% Type I Error: Solid Curve Is For
Positive Effects, Dots Are For Negative Effects. Panel (b): E(N) = 42.
APPENDIX


The random variables $e_1, \ldots, e_r$ are said to be exchangeably
distributed if the joint distribution of
$e_{\pi 1}, e_{\pi 2}, \ldots, e_{\pi r}$ is the same as the joint
distribution of $e_1, e_2, \ldots, e_r$ for all permutations $\pi$ of
$(1, 2, \ldots, r)$.

In the ordinary linear model (OLM) we assume that the error terms
are i.i.d. normal random variables. In the exchangeable linear model
(ELM) we assume exchangeably normally distributed errors. Following
Arnold (1981; pp. 232-238), the ELM is equivalently the model in which
we observe $\mathbf{Y} \sim N_r(\boldsymbol{\mu}, \sigma^2 A(\rho))$,
where $\boldsymbol{\mu}$ is an $r \times 1$ mean vector and $A(\rho)$
has the following form

$$A(\rho) = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$

A  key  result  derived  by  Arnold  (1981)  is  that  in  an  ELM  one-way 
analysis  of  variance,  equality  of  level  means  can  be  validly  tested  with 
the  same  F-tests  customarily  used  in  the  OLM.  As  an  aside,  we  note  that 
the  ELM  is  simply  a  repeated  measures  model  with  only  one  individual. 

We now proceed to show that the RB model when analyzed with separate
F-tests has the same covariance structure as the ELM. In matrix
notation, model (2.1) can be written compactly as
$\mathbf{y} = \beta_0 \mathbf{1} + X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$,
where $\mathbf{1}$ is an $N \times 1$ vector of +1's and
$\boldsymbol{\varepsilon}$ is an $N \times 1$ vector
$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)'$ of i.i.d.
$N(0, \sigma_\varepsilon^2)$ error terms. We wish to test the hypothesis
$H_0: \beta_j = 0$ versus $H_1: \beta_j \ne 0$ with a simple F-test (or,
equivalently, a two-sample t-test) applied to the observations at the
high (+1) and low (-1) levels of the $j$th factor. Without loss of
generality, we assume that the observations are indexed so that
$\{y_i;\ i \le N/2\}$ have $x_{ij} = +1$ and $\{y_i;\ i > N/2\}$ have
$x_{ij} = -1$.

Thus, for $i \le N/2$ we have $y_i = \beta_0 + \beta_j + e_i$, and for
$i > N/2$ we have $y_i = \beta_0 - \beta_j + e_i$, where
$e_i = \sum_{m \ne j}^{K} \beta_m x_{im} + \varepsilon_i$. It can
readily be shown that $\mathbf{y} = (y_1, y_2, \ldots, y_N)'$ has mean
vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$ given
by

$$\boldsymbol{\mu} = (\beta_0 + \beta_j, \ldots, \beta_0 + \beta_j,\ \beta_0 - \beta_j, \ldots, \beta_0 - \beta_j)',$$

$$\Sigma = (\tau_j^2 + \sigma_\varepsilon^2)\, A(\rho),$$

where $\tau_j^2 = \sum_{m \ne j} \beta_m^2$ and
$\rho = -\tau_j^2 / [(N-1)(\tau_j^2 + \sigma_\varepsilon^2)]$.

We see, then, that the RB model has the same covariance structure
as the ELM defined earlier, setting
$\sigma^2 = \tau_j^2 + \sigma_\varepsilon^2$. The only difference
between the two models is that the errors ($e_i$) in the RB model are
not precisely joint normal. We suspect, however, that this violation has
little effect on the F-test for two reasons: (1) Arnold (1983) has
demonstrated asymptotic validity against nonnormality for tests of this
type for the repeated measures model, of which the ELM is a special
case; (2) nonnormality generally has a small effect on tests about
means in the presence of balanced sampling, zero skewness, and zero
kurtosis. In the RB model, each $e_i$ has zero skewness and kurtosis
given by
$-2 \sum_{m \ne j}^{K} \beta_m^4 / (\tau_j^2 + \sigma_\varepsilon^2)^2$,
which is clearly dominated by the term in the denominator.

Using results derived by Arnold (1981) for the ELM, it can be shown
that the appropriate noncentrality parameter for our testing problem is
$\delta = N\beta_j^2 / [(\tau_j^2 + \sigma_\varepsilon^2)(1-\rho)]$,
where $\rho$ is as defined previously. Accordingly, an approximation to
the power of an RB($N, \alpha_r$) strategy for detecting the $j$th
factor is given by the expression:

$$\text{Power} \approx P\{F^* > F(1-\alpha_r;\ 1, N-2)\}$$

where $F^*$ has a noncentral F-distribution with $(1, N-2)$ degrees of
freedom and noncentrality parameter $\delta$.
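The quantities entering the power approximation can be computed as in
the sketch below, which takes
$\tau_j^2 = \sum_{m \ne j} \beta_m^2$,
$\rho = -\tau_j^2 / [(N-1)(\tau_j^2 + \sigma_\varepsilon^2)]$, and
$\delta = N\beta_j^2 / [(\tau_j^2 + \sigma_\varepsilon^2)(1-\rho)]$.
The function name `rb_noncentrality` is hypothetical, and the final
step, evaluating the noncentral F tail probability, is left to
statistical tables or software and is not shown.

```python
def rb_noncentrality(beta, j, n_runs, sigma_eps=1.0):
    """Return (tau_j^2, rho, delta) for testing factor j in an RB design.

    beta      : list of the K linear effects
    j         : index of the factor under test
    n_runs    : number of runs N
    sigma_eps : error standard deviation
    """
    # tau_j^2: sum of squared effects of all factors other than j.
    tau2 = sum(b * b for m, b in enumerate(beta) if m != j)
    total_var = tau2 + sigma_eps ** 2
    # Common correlation between the composite errors e_i.
    rho = -tau2 / ((n_runs - 1) * total_var)
    # Noncentrality parameter of the F-test on (1, N-2) d.f.
    delta = n_runs * beta[j] ** 2 / (total_var * (1.0 - rho))
    return tau2, rho, delta

# Example: three factors, testing the first, in N = 20 runs.
tau2, rho, delta = rb_noncentrality([2.0, 1.0, 0.0], 0, 20)
```

When no other factor has an effect, $\tau_j^2 = 0$ and $\rho = 0$, so
$\delta$ reduces to $N\beta_j^2 / \sigma_\varepsilon^2$, the usual
noncentrality parameter for a balanced two-sample comparison.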